chore(ci): remove NATIVE=1 path; migrate all CI to Podman containers (#501) #568

Open
mvillmow wants to merge 13 commits into main from 501-impl

@mvillmow

Collaborator

Summary

  • Removes the NATIVE=1 / %.native: escape hatch from the Makefile entirely; all builds now run unconditionally via podman compose exec -T dev
  • Wraps the deps target with CONTAINER_CHECK/CONTAINER_PREFIX so make deps installs Conan inside the container (closes AC1 from the issue)
  • Adds cap_add: [SYS_PTRACE] and security_opt: [seccomp=unconfined] to the dev service in docker-compose.yml for sanitizer builds (ASan/TSan/LSan/UBSan)
  • Creates .github/actions/podman-setup composite action: installs podman-compose, restores/builds the dev image via actions/cache@v5 keyed on Containerfile hash, starts the container, and smoke-tests exec
  • Calls podman-setup at the end of install-build-deps (new setup-podman input, default true) so every CI job has a running container before any make step
  • Replaces all make X.native calls in _required.yml (13), extras.yml (8), and release-please.yml (2) with plain make X
  • Updates README, PR template, and docs/CICD_COVERAGE.md to remove all .native references

Plan Divergences (noted per review comment)

| Divergence | Resolution |
| --- | --- |
| Plan referenced podman-compose.yml (does not exist) | Used actual docker-compose.yml; podman compose finds it automatically |
| Plan omitted CONTAINER_PREFIX wrapping on deps target | Added; this was a critical gap flagged in the plan review |
| GIT_COMMIT/BUILD_UID/BUILD_GID not exported before compose up | Exported in podman-setup action before podman compose up -d dev |
| Plan contradicted itself (new action vs. inline) | Created separate podman-setup action (SRP) with a single setup-podman input on install-build-deps |
| test-all Containerfile stage | Not added (YAGNI: no existing reference, no acceptance criterion) |
| CONTAINER_CHECK had `\|\| true` (masked failures) | Removed; failure now propagates cleanly |

Acceptance Criteria

  • make deps (no .native) runs inside container in CI
  • All required workflow jobs use CONTAINER_PREFIX = podman compose exec -T dev
  • NATIVE, ifeq (NATIVE,1), and %.native: do not appear in the Makefile
  • Container exec smoke test passes in podman-setup action
  • Sanitizer builds have SYS_PTRACE capability in docker-compose.yml
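
The third criterion lends itself to a mechanical check. The following is a minimal sketch, not something this PR ships: the function name `native_free` is hypothetical, and it assumes that any occurrence of the string `NATIVE` or of a `%.native:` pattern rule counts as a violation.

```shell
# Hypothetical helper: fail if any NATIVE escape hatch survives in a Makefile.
native_free() {
  # A missing file must not pass silently (grep would exit 2, not 1).
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 2
  fi
  # -n prints matching lines with their line numbers so a failure is easy to locate.
  # 'NATIVE' also matches `ifeq ($(NATIVE),1)`; the second branch matches the `%.native:` rule.
  if grep -nE 'NATIVE|%\.native:' "$1"; then
    echo "NATIVE path still present in $1" >&2
    return 1
  fi
  return 0
}

# Possible use in a CI step (assumed, not taken from the workflows):
#   native_free Makefile
```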

Closes #501

🤖 Generated with Claude Code

Comment thread include/concurrency/logger.hpp
Comment thread include/concurrency/logger.hpp

@mvillmow mvillmow left a comment

Collaborator Author

Failed to parse structured output from analysis

@mvillmow mvillmow left a comment

Collaborator Author

NOGO: CI never builds/starts the dev container; GIT_COMMIT/BUILD_UID/BUILD_GID unset so exec hits a missing/garbled container; host/container build split defeats the goal.

Comment thread .github/actions/install-build-deps/action.yml Outdated
Comment thread .github/actions/install-build-deps/action.yml
Comment thread .github/workflows/_required.yml
Comment thread .github/actions/install-build-deps/action.yml
mvillmow added a commit that referenced this pull request Jun 19, 2026
…s, lint boundary

- Add podman-version.env with pinned apt version (5.0.2+ds1-4ubuntu1) for
  Renovate-friendly version management
- Source version pin in install step to prevent runner-image drift
- Export GIT_COMMIT/BUILD_UID/BUILD_GID to $GITHUB_ENV so docker-compose.yml
  image tag and user: directive resolve correctly
- Add actions/cache + podman save/load for dev image to avoid rebuilding on
  every CI run (keys on Containerfile/docker-compose.yml/conanfile.py hash)
- Add explicit podman-compose up -d dev + readiness poll (10×2s) so build
  failures abort loudly instead of surfacing as confusing exec errors
- Add cap_add: SYS_PTRACE and security_opt: seccomp:unconfined to dev service
  in docker-compose.yml for ASan/TSan sanitizer builds
- Stabilize dev image tag to :latest and container_name to projectkeystone-dev
- Wrap Configure CMake and Build with clang-tidy steps in
  podman-compose exec -T dev to maintain environment parity with make deps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
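
The env export and the readiness poll described in this commit could look roughly like the sketch below. Both function names are hypothetical, and how `GIT_COMMIT` is derived (here, passed in by the caller) is an assumption; only the variable names, `$GITHUB_ENV`, and the 10×2s poll come from the commit message.

```shell
# Hypothetical helper: append the variables that docker-compose.yml interpolates
# to the file GitHub Actions exposes as $GITHUB_ENV.
write_build_env() {
  {
    printf 'GIT_COMMIT=%s\n' "$1"
    printf 'BUILD_UID=%s\n' "$(id -u)"
    printf 'BUILD_GID=%s\n' "$(id -g)"
  } >> "$GITHUB_ENV"
}

# Hypothetical helper: run a probe command up to $1 times,
# sleeping $2 seconds after each failed try.
wait_until_ready() {
  attempts="$1"
  delay="$2"
  shift 2
  try=1
  while [ "$try" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${try} attempt(s)"
      return 0
    fi
    sleep "$delay"
    try=$((try + 1))
  done
  echo "not ready after ${attempts} attempts" >&2
  return 1
}

# Assumed invocation inside the action, mirroring the 10x2s poll:
#   write_build_env "$(git rev-parse --short HEAD)"
#   podman-compose up -d dev
#   wait_until_ready 10 2 podman-compose exec -T dev true
```

Because `wait_until_ready` returns non-zero when the probe never succeeds, a container that failed to start stops the step there rather than surfacing later as an exec error.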

@mvillmow mvillmow left a comment

Collaborator Author

NOGO: missing conan profile detect breaks make deps in fresh container; cache-load skips rebuild without verifying :latest tag; fuzzy restore-keys + missing host sysctl for sanitizers.

Comment thread Makefile
Comment thread .github/actions/install-build-deps/action.yml
Comment thread .github/actions/install-build-deps/action.yml
Comment thread .github/actions/install-build-deps/action.yml
Comment thread .github/actions/install-build-deps/action.yml
mvillmow added a commit that referenced this pull request Jun 28, 2026
Addresses remaining self-review threads on the CI-migration action:

- Verify the cache-restored image carries the projectkeystone-dev:latest
  tag after `podman load`; rebuild if a stale/mistagged tarball loaded
  (so it can no longer silently fail at `podman-compose up`).
- Drop the broad `restore-keys: podman-` so a partial cache hit can no
  longer load a tarball built from a different Containerfile/conanfile;
  exact hashFiles key only, rebuild on any input change.
- Set vm.mmap_rnd_bits=28 on the runner host so in-container ASan/TSan/LSan
  do not abort with shadow-memory mapping errors on the noble kernel.
- Assert `podman info` reports rootless=true instead of merely printing it,
  so a rootful runner fails the step.
- Defensively run `conan profile detect --exist-ok` in `make deps` before
  `conan install` (the dev image already detects a profile at build time).

Refs #568

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>
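
The tag check after `podman load` can be sketched as a small decision function. This is an illustration under assumptions, not the action's actual code: `needs_rebuild` is a hypothetical name, and treating an unloadable tarball as a cache miss is a choice made here. `podman load -i` and `podman image exists` are existing Podman subcommands.

```shell
# Hypothetical helper: exit 0 when the dev image must be rebuilt, 1 when the
# cache-restored tarball yielded an image with the expected tag.
needs_rebuild() {
  archive="$1"
  image="$2"
  # No cached tarball at all: a plain cache miss.
  [ -f "$archive" ] || return 0
  # A tarball that fails to load is treated like a miss.
  podman load -i "$archive" >/dev/null 2>&1 || return 0
  # `podman image exists` exits 0 only when the named image is in local storage.
  if podman image exists "$image"; then
    return 1
  fi
  # Loaded, but under a different tag: rebuild rather than fail later at `up`.
  return 0
}

# Assumed invocation:
#   if needs_rebuild /tmp/dev-image.tar projectkeystone-dev:latest; then
#     podman-compose build dev
#   fi
```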
mvillmow added a commit that referenced this pull request Jun 29, 2026
…stall-build-deps

The NATIVE->Podman migration forced every required job to build the dev
container from Containerfile (the cache key includes docker-compose.yml,
which #568 changed, so #568 always cache-misses and rebuilds). The rebuild
failed because 'pip install conan==2.0.0' can no longer build its pinned
PyYAML 5.x sdist under modern setuptools/Cython:

    AttributeError: 'build_ext' object has no attribute 'cython_sources'

This bricked the 'Install build dependencies' step of both lint and
coverage. Relax the pin to 'conan>=2.0,<3' (resolves to a current conan 2.x
with PyYAML 6.x), keeping the conan 2 major the conanfile.py API requires.
Verified the container build + 'conan profile detect' succeed on ubuntu:24.04.

Also replace the 'apt-cache madison podman >&2 || true' fallback diagnostic
with an explicit if-guard so the forbid-suppressions required check (which
rejects the '|| true' silent-failure idiom) passes without swallowing a real
non-zero exit.

Coverage floor is unchanged: scripts/generate_coverage.sh and the test set
are identical to main; coverage only failed because the container build did.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Micah Villmow <4211002+mvillmow@users.noreply.github.com>
mvillmow added a commit that referenced this pull request Jun 29, 2026
…ed in install-build-deps

The NATIVE->Podman migration forced every required job to build the dev
container from Containerfile (the cache key includes docker-compose.yml,
which #568 changed, so #568 always cache-misses and rebuilds). Three
defects bricked that path:

1. 'pip install conan==2.0.0' can no longer build its pinned PyYAML 5.x
   sdist under modern setuptools/Cython ('AttributeError: build_ext object
   has no attribute cython_sources'), failing the container build. Relax to
   'conan>=2.0,<3' (current conan 2.x + PyYAML 6.x), keeping the conan 2
   major conanfile.py requires. Verified build + 'conan profile detect'
   succeed on ubuntu:24.04.

2. After a successful build, 'podman save -o /tmp/dev-image.tar' aborts with
   'docker-archive doesn't support modifying existing images' (exit 125)
   when a stale tarball is present. Remove any prior archive before saving.
   Reproduced and verified the fix locally.

3. The 'apt-cache madison podman >&2 || true' fallback diagnostic tripped
   the forbid-suppressions required check. Replaced with an explicit
   if-guard that reports a non-zero exit instead of swallowing it.

Coverage floor is unchanged: scripts/generate_coverage.sh and the test set
are identical to main; coverage only failed because the container build did.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Micah Villmow <4211002+mvillmow@users.noreply.github.com>
mvillmow and others added 9 commits June 29, 2026 02:14
…501)

Drop the NATIVE=1 host-build bypass and the `%.native` Makefile pattern
rule so every build/test runs inside the Podman `dev` container. The
CI workflows (_required.yml, extras.yml, release-please.yml) now invoke
the container targets directly (make deps, make compile.debug,
make test.debug.asan, make benchmark, ...) instead of the removed
`.native` variants. Update the Makefile help text accordingly.

Rebased onto current main and reduced to the intended CI/build change
only — the agent layer and Python orchestration were extracted to
ProjectAgamemnon per ADR-015/016, so this PR carries no agent source
and no Python-CI changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>

After removing the NATIVE=1 path, CI runs `make deps` then `make compile.X`.
`compile` runs cmake inside the `dev` container (CONTAINER_PREFIX) with
-DCMAKE_TOOLCHAIN_FILE=build/conan-deps/conan_toolchain.cmake, but `deps`
was still running `conan install` on the host. The host-generated toolchain
references the host's conan cache/compiler paths, which do not exist inside
the container, so the in-container cmake configure failed (exit 1) for the
coverage, benchmarks, release, and NATS-integration build jobs.

Run the conan installs through CONTAINER_PREFIX too so the toolchain and
packages are generated in the same container environment cmake builds in.
The repo is bind-mounted at /workspace, so build/conan-deps still lands in
the cached host path. Mirrors the previous native flow where deps.native
and compile.native both ran on the host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>

Add Podman setup to GitHub Actions build dependencies action:
- Install podman and podman-compose packages
- Start rootless Podman socket on GitHub Actions runners
- Set DOCKER_HOST env var for docker-compose CLI plugin compatibility
- Fix workspace permissions for Podman UID namespace mapping
- Verify Podman installation works

This fixes the issue where 'podman compose' was delegating to
docker-compose CLI plugin instead of using Podman's native compose
support, causing build failures in CI containers.

Addresses issue #501: Migrate CI from native builds to Podman containers.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>

Formatting changes from cmake-format hook to maintain code style
consistency across the project.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>

Replace 'podman compose' (which delegates to snap's docker-compose) with
'podman-compose' standalone tool for proper Podman integration in CI.

Add DOCKER_HOST environment variable support to Makefile rules to enable
rootless Podman socket connectivity in CI environments.

Fixes container startup failures when running 'make deps' and other
container-dependent targets in GitHub Actions runners.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>

Fix CodeQL warnings about unused template parameter 'args' in Logger::log()
by using if constexpr to conditionally log based on whether format arguments
are present. This resolves false positive static analysis warnings while
maintaining correct behavior for both zero-argument and variadic-argument
cases.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>

Apply clang-format fixes across the entire codebase and add SELinux
relabeling flags to docker-compose.yml volume mounts for rootless Podman.

Changes:
- Add :Z flag to volume mounts in dev and build services for proper
  SELinux context sharing with rootless Podman containers
- Apply clang-format to all C++ source files to pass CI linting checks

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>

Remove NATIVE=1 path; migrate all CI to Podman containers

Closes #501

Implemented-By: claude-sonnet-4-6
Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>
mvillmow and others added 4 commits June 29, 2026 02:14
The ubuntu-24.04 runner image rotated podman 5.0.2+ds1-4ubuntu1 out of
its apt repo, so `apt-get install podman=${PODMAN_APT_VERSION}` failed
with "Version '...' for 'podman' was not found" (exit 100), turning the
required coverage/build jobs red. Keep the reproducibility pin when the
exact version is present, but fall back to the latest available podman
when apt has rotated it out, so an upstream repo change cannot
hard-break required CI.

Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>
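
The pin-with-fallback behaviour described above can be sketched as a filter over `apt-cache madison` output. The function name `podman_pkg_spec` is hypothetical, and the sketch assumes madison keeps its `package | version | source` column layout; it is not the code in install-build-deps.

```shell
# Hypothetical helper: read `apt-cache madison podman` output on stdin and print
# the apt package spec to install: the exact pin if apt still offers it,
# otherwise the unpinned package name.
podman_pkg_spec() {
  if awk -F'|' -v want="$1" '
       { v = $2; gsub(/[ \t]/, "", v); if (v == want) found = 1 }
       END { exit(found ? 0 : 1) }
     '; then
    printf 'podman=%s\n' "$1"
  else
    printf 'podman\n'
  fi
}

# Assumed invocation in the install step:
#   spec="$(apt-cache madison podman | podman_pkg_spec "$PODMAN_APT_VERSION")"
#   sudo apt-get install -y "$spec"
```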
The coverage and lint required jobs failed with a rootless-Podman
bind-mount ownership mismatch, not a code defect:

  coverage: mkdir: cannot create directory 'build/x86.coverage.debug':
            Permission denied   (Makefile:79, host-side mkdir)
  lint:     CMake Error: Unable to (re)create the private pkgRedirects
            directory          (in-container cmake configure)

Both are the same bug. The dev service ran as user "${BUILD_UID}" under
the default rootless userns, which maps that uid to a host *subuid*
(e.g. 1001 -> 101000). Anything the container wrote under the mounted
workspace (build/conan-deps from 'make deps', etc.) became owned by that
foreign subuid on the host, so the host runner could no longer create
build/x86.coverage.debug, and an in-container cmake configure could no
longer recreate build/.

Add 'userns_mode: keep-id' to the dev (and build) services. keep-id maps
the host runner uid 1:1 into the container, so the host and the
in-container build user share ownership of the bind mount. Verified
locally with podman 5.8.3 / podman-compose 1.5.0: with keep-id and
user=$(id -u) (exactly what install-build-deps sets BUILD_UID to),
both host-side and in-container mkdir under build/ succeed.

Signed-off-by: mvillmow <4211002+mvillmow@users.noreply.github.com>
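
The ownership mismatch described in this commit can be detected on the host with a short diagnostic. This is a sketch only; `foreign_owned` is a hypothetical name and is not part of the PR.

```shell
# Hypothetical diagnostic: print every path under $1 that is not owned by the
# invoking user. Under the default rootless userns, files the container wrote
# into the bind mount appear on the host with a subuid owner and are listed here;
# with keep-id the output is expected to be empty.
foreign_owned() {
  find "$1" ! -user "$(id -u)" -print
}

# Assumed invocation after `make deps`:
#   if [ -n "$(foreign_owned build)" ]; then
#     echo "build/ contains files owned by another uid" >&2
#   fi
```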
@mvillmow

Collaborator Author

Needs a design decision — CMake container/host path mismatch

Automated sweep got this PR through two layers of CI-migration bugs, but the remaining failure is architectural and needs your call as the migration author.

Progress made (pushed, signed):

Remaining blocker (required lint + coverage): the migration runs CMake inconsistently against the shared bind-mounted build/ tree:

  • lint (clang-tidy) configures CMake inside the container → cache stamped /workspace/build/x86.debug.clang-tidy/...
  • coverage configures/reads CMake on the host → expects /home/runner/work/ProjectKeystone/ProjectKeystone/build/...

Because both reuse the same build/ dir, the second hits:

    CMake Error: The current CMakeCache.txt directory /workspace/build/... is different than
    the directory /home/runner/work/.../build/... where CMakeCache.txt was created.

Decision needed: pick one build-isolation model for the migration —

  1. Run all CMake/build steps inside the container (so every path is /workspace/...), or
  2. Keep them on the host, or
  3. Use separate build dirs per context (e.g. build/x86.debug.clang-tidy in-container vs a distinct host dir) and don't share a CMakeCache across the container boundary.

This is left open, with no urgency attached, for your decision; the rest of the ecosystem PR sweep is complete.
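
For reference, option 3 above could be sketched as follows. The helper name `build_dir_for` and the `build/container` and `build/host` layout are hypothetical; the sketch relies on Podman creating `/run/.containerenv` inside containers, and the `CONTAINER_MARKER` override exists only to make the function testable.

```shell
# Podman creates /run/.containerenv inside containers.
CONTAINER_MARKER="${CONTAINER_MARKER:-/run/.containerenv}"

# Hypothetical helper: map a build preset to a directory that is never shared
# across the container boundary, so no CMakeCache.txt is reused under a
# different absolute path.
build_dir_for() {
  if [ -e "$CONTAINER_MARKER" ]; then
    printf 'build/container/%s\n' "$1"
  else
    printf 'build/host/%s\n' "$1"
  fi
}

# Assumed invocation:
#   cmake -S . -B "$(build_dir_for x86.debug.clang-tidy)"
```

Option 1 would make such a helper unnecessary, since every CMake invocation would then see `/workspace/...` paths.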
